27 research outputs found

    Non-Compositional Term Dependence for Information Retrieval

    Full text link
    Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR

    Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval

    Get PDF
    International audienceDocument coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where adjusting the ranking of documents according to both their relevance and their coherence has been shown to increase retrieval effectiveness.The state of the art in unsupervised coherence modelling represents documents as bipartite graphs of sentences and discourse entities, and then projects these bipartite graphs into one–mode undirected graphs. However, one–mode projections may incur significant loss of the information present in the original bipartite structure. To address this we present three novel graph metrics that compute document coherence on the original bipartite graph of sentences and entities. Evaluation on standard settings shows that: (i) one of our coherence metrics beats the state of the art in terms of coherence accuracy; and (ii) all three of our coherence metrics improve retrieval effectiveness because, as closer analysis reveals, they capture aspects of document quality that go undetected by both keyword-based standard ranking and by spam filtering. This work contributes document coherence metrics that are theoretically principled, parameter-free, and useful to IR

    Immobilization of Cesium Traps from the BN-350 Fast Reactor (Aktau, Kazakhstan) -11062

    No full text
    ABSTRACT During BN-350 reactor operations and also during the initial stages of decommissioning, cesium traps were used to decontaminate the reactor's primary sodium coolant. Two different types of carbon-based trap were used -the MAVR 1 series, low ash granulated graphite adsorber (LAG) contained in a carrier designed to be inserted into the reactor core during shutdown; and a series of ex-reactor trap accumulators (TAs) which used reticulated vitreous carbon (RVC) to reduce Cs-137 levels in the sodium after final reactor shutdown. In total four MAVRs and seven TAs were used at BN-350 to remove an estimated cumulative 755 TBq of cesium. The traps, which also contain residual sodium, need to be immobilized in an appropriate way to allow them to be consigned as waste packages for long term storage and, ultimately, disposal. The present paper reports on the current status of the implementation phase of immobilization, with particular reference to the work done to date on the trap accumulators, which have the most similarity with the cesium traps used at other fast reactors
    corecore